Speed learning on the fly
Authors
Abstract
The practical performance of online stochastic gradient descent algorithms is highly dependent on the chosen step size, which must be tediously hand-tuned in many applications. The same is true for more advanced variants of stochastic gradients, such as SAGA, SVRG, or AdaGrad. Here we propose to adapt the step size by performing a gradient descent on the step size itself, viewing the whole performance of the learning trajectory as a function of step size. Importantly, this adaptation can be computed online at little cost, without having to iterate backward passes over the full data.

Introduction

This work aims at improving gradient ascent procedures for use in machine learning contexts, by adapting the step size of the ascent as it goes along. Let ℓ_0, ℓ_1, ..., ℓ_t, ... be functions to be maximised over some parameter space Θ. At each time t, we wish to compute or approximate the parameter θ*_t ∈ Θ that maximizes the sum

L_t(θ) := ∑_{s ≤ t} ℓ_s(θ).   (1)

In the experiments below, as in many applications, ℓ_t(θ) writes ℓ(x_t, θ) for some data x_0, x_1, ..., x_t, ... A common strategy, especially with large data size or dimensionality [Bot10], is the online stochastic gradient ascent (SG)

θ_{t+1} = θ_t + η ∂_θ ℓ_t(θ_t)   (2)

with step size η, where ∂_θ ℓ_t stands for the Euclidean gradient of ℓ_t with respect to θ. Such an approach has become a mainstay of both the optimisation and machine learning communities [Bot10]. Various conditions for convergence exist, starting with the celebrated article of Robbins and Monro [RM51], or later [KC78]. Other types of results are proved in convex settings. Several variants have since been introduced, in part to improve the convergence of the algorithm, which is much slower in stochastic than in...
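The abstract's idea of "a gradient descent on the step size itself" can be illustrated with a minimal sketch: since θ_{t+1} = θ_t + η ∂_θ ℓ_t(θ_t), the derivative of ℓ_{t+1}(θ_{t+1}) with respect to η is, to first order, the inner product of successive gradients, so η can be adapted online from that signal alone. The sketch below is an assumption-laden illustration, not the authors' exact algorithm: the toy loss ℓ_t(θ) = -(θ - x_t)²/2, the sign normalization of the hypergradient (an illustrative robustness choice), and all constants are hypothetical.

```python
import numpy as np

def online_sg_adaptive(data, theta0=0.0, eta0=0.01, beta=0.02):
    """Online stochastic gradient ascent with step-size adaptation.

    Toy losses l_t(theta) = -(theta - x_t)^2 / 2, so the gradient is
    g_t = x_t - theta_t. Because theta_t = theta_{t-1} + eta * g_{t-1},
    the derivative of l_t(theta_t) w.r.t. eta is approximately
    g_t * g_{t-1}; we ascend log(eta) along the sign of that product
    (a sign-normalized hypergradient rule, an illustrative choice).
    """
    theta, eta = theta0, eta0
    g_prev = 0.0
    for x in data:
        g = x - theta                          # gradient of l_t at theta_t
        # grow eta when successive gradients agree, shrink when they oscillate
        eta *= np.exp(beta * np.sign(g * g_prev))
        theta = theta + eta * g                # the SG ascent step (2)
        g_prev = g
    return theta, eta

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=5000)
theta_hat, eta_final = online_sg_adaptive(data)
```

Near the optimum the successive gradients decorrelate and their product acquires a negative drift, so the multiplicative rule automatically shrinks η instead of letting the iterates oscillate; far from the optimum the gradients agree and η grows, which is exactly the "learning the trajectory's performance as a function of step size" behaviour the abstract describes.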
Similar resources
An Initial Study on the Coordination of Rod and Line Hauling Movements in Distance Fly Casting
Background. The double haul is a unique feature of single-handed fly casting and is used in both fly fishing and fly casting competition. The movement behaviour during the double haul has not been investigated in previous research. Objectives. Describe the coordination of the rod and line hauling movements during distance fly casting. Methods. Elite fly casters performed distance castin...
Providing a Bird Swarm Algorithm based on Classical Conditioning Learning Behavior and Comparing this Algorithm with sinDE, JOA, NPSO and D-PSO-C Based on Using in Nanoscience
There can be no doubt that nanotechnology will play a major role in our future technology. Computer science offers more opportunities for quantum and nanotechnology systems. Soft Computing techniques such as swarm intelligence can enable systems with desirable emergent properties. Optimization is an important and decisive activity in structural designing. The inexpensive re...
New Analytic Method for Subgrade Settlement Calculation of the New Cement Fly-ash Grave Pile-slab Structure
At present, reducing subgrade settlement of soft soil foundation is a key problem in high-speed railway construction. Pile-slab structure is a widely-utilized form of foundation structure to reduce the subgrade settlement in China. In order to save the engineering cost for high-speed railway construction in developing countries, the author developed a pile-slab structure and named it as the new...
Soft Foundation Strengthening Effect and Structural Optimization of a New Cement Fly-ash and Gravel Pile-slab Structure
Reducing the settlements of soft foundation effectively is a critical problem of high-speed railway construction in China. The new CFG pile-slab structure composite foundation is a ground treatment technique which is applied on CFG pile foundation and pile-slab structure composite foundation. Based on the experience of constructing Beijing-Shanghai high-speed railway in China, the settlement-co...
From Traditional Neural Networks to Deep Learning: Towards Mathematical Foundations of Empirical Successes
How do we make computers think? To make machines that fly, it is reasonable to look at the creatures that know how to fly: the birds. To make computers think, it is reasonable to analyze how we think – this is the main origin of neural networks. At first, one of the main motivations was speed – since even with slow biological neurons, we often process information fast. The need for speed motiva...
On the convergence speed of artificial neural networks in the solving of linear systems
Artificial neural networks have the advantages such as learning, adaptation, fault-tolerance, parallelism and generalization. This paper is a scrutiny on the application of diverse learning methods in speed of convergence in neural networks. For this aim, first we introduce a perceptron method based on artificial neural networks which has been applied for solving a non-singula...
Journal title: CoRR
Volume: abs/1511.02540
Issue: -
Pages: -
Publication date: 2015